Hierarchical Machine Translation With Discontinuous Phrases
Abstract
We present a hierarchical statistical machine translation system that supports discontinuous constituents. It is based on synchronous linear context-free rewriting systems (SLCFRS), an extension of synchronous context-free grammars in which synchronized non-terminals span k ≥ 1 continuous blocks on either side of the bitext. This extension beyond context-freeness is motivated by two observations: certain complex alignment configurations lie beyond the alignment capacity of current translation models, and these configurations occur relatively frequently in hand-aligned data. Our German-to-English translation experiments demonstrate the feasibility of training and decoding with more expressive translation models such as SLCFRS and show a modest improvement over a context-free baseline.
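The key quantity in SLCFRS is the number of continuous blocks k (the fan-out) that a synchronized non-terminal covers on each side of the bitext. The Python sketch below is our own illustration, not the authors' system. It computes that number for a set of aligned word positions and checks whether a span fits a given maximum fan-out. The function names `blocks`, `fan_out` and `fits_slcfrs_span`, as well as the example sentence pair, are hypothetical.

```python
from typing import Iterable, List, Tuple

# A block is a half-open interval [start, end) of word positions.
Block = Tuple[int, int]


def blocks(positions: Iterable[int]) -> List[Block]:
    """Group word positions into maximal runs of consecutive integers."""
    result: List[Block] = []
    for p in sorted(set(positions)):
        if result and result[-1][1] == p:
            # p extends the current block by one word.
            start, _ = result[-1]
            result[-1] = (start, p + 1)
        else:
            # Gap before p (or first word): open a new block.
            result.append((p, p + 1))
    return result


def fan_out(positions: Iterable[int]) -> int:
    """Number of continuous blocks the positions occupy."""
    return len(blocks(positions))


def fits_slcfrs_span(src_positions: Iterable[int],
                     tgt_positions: Iterable[int],
                     max_k: int) -> bool:
    """True if both sides need at most max_k continuous blocks.

    max_k = 1 corresponds to an ordinary (context-free) synchronous
    non-terminal; larger values allow discontinuous spans.
    """
    return fan_out(src_positions) <= max_k and fan_out(tgt_positions) <= max_k


if __name__ == "__main__":
    # Hypothetical pair: "Ich habe das Buch gelesen" / "I have read the book".
    # German "habe ... gelesen" sits at positions {1, 4}; English "have read" at {1, 2}.
    src, tgt = {1, 4}, {1, 2}
    print(blocks(src), blocks(tgt))
    print(fits_slcfrs_span(src, tgt, max_k=1), fits_slcfrs_span(src, tgt, max_k=2))
```

In this example the German side splits into two blocks, so a non-terminal restricted to a single block (k = 1) cannot cover the pair, whereas one with k = 2 can.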
Similar resources
Source-Side Discontinuous Phrases for Machine Translation: A Comparative Study on Phrase Extraction and Search
Standard phrase-based statistical machine translation systems generate translations based on an inventory of continuous bilingual phrases. In this work, we extend a phrase-based decoder with the ability to make use of phrases that are discontinuous in the source part. Our dynamic programming beam search algorithm supports separate pruning of coverage hypotheses per cardinality and of lexical hy...
Accurate Non-Hierarchical Phrase-Based Translation
A principal weakness of conventional (i.e., non-hierarchical) phrase-based statistical machine translation is that it can only exploit continuous phrases. In this paper, we extend phrase-based decoding to allow both source and target phrasal discontinuities, which provide better generalization on unseen data and yield significant improvements to a standard phrase-based system (Moses). More inte...
Effective Use of Discontinuous Phrases for Hierarchical Phrase-based Translation
Hierarchical phrase-based (HPB) models have shown strong capability in generalization and reordering. However, they are heavily dependent on continuous phrases and are difficult for modeling natural linguistic discontinuities directly. In this paper, we propose a novel approach for integrating discontinuous phrases into the Chinese-to-English HPB system. We focus on the extraction method of dis...
A Dictionary Lookup Strategy for Translating Discontinuous Phrases
Translation of discontinuous phrases is a major challenge in Machine Translation. Within METIS-II we developed a dictionary lookup strategy by mapping the items of a dictionary entry on non-adjacent words in an input text. Mapping is controlled through so-called contextual rejection, i.e. inappropriate mappings are discarded if they fail to satisfy a predefined set of constraints. We present va...
Analysing soft syntax features and heuristics for hierarchical phrase based machine translation
Similar to phrase-based machine translation, hierarchical systems produce a large proportion of phrases, most of which are supposedly junk and useless for the actual translation. For the hierarchical case, however, the amount of extracted rules is an order of magnitude bigger. In this paper, we investigate several soft constraints in the extraction of hierarchical phrases and whether these help...